{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
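    "# MVP Value Proposition Statement\n",
    "\n",
    "\n",
    "## PRD1: Data Value-Add Statement\n",
    "\n",
    "- This project uses Scrapy with XPath selectors to scrape **book** listings from the Amazon website.\n",
    "- [Amazon website](https://www.amazon.com/)\n",
    "- [Books category page](https://www.amazon.com/b?node=283155)\n",
    "- Fields collected: price, title, book category, detail-page link, and total number of ratings. Together they provide data for researching and analyzing how books (or a specific book category) sell on Amazon.\n",
    "- Categories (search keywords): Arts & Photography, Business & Management, Computers & Internet, and others\n",
    "- Pages: each category has more than 70 result pages, with about 20 items per page\n",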
    "\n",
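    "## PRD2: Data Value of the MVP\n",
    "\n",
    "- MVP: the minimum viable value of this project is a collected dataset of sales-related information for book products. It enables data analysis and comparison across different books, supporting research into consumer preferences and industry trends.\n",
    "- Problem addressed: analyzing and comparing the genres, prices, and reader popularity of books sold on Amazon.\n",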
    "\n",
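    "## Parameter Design\n",
    "- Output format: exported as an Excel spreadsheet\n",
    "- Parameters: price, title, book category, detail-page link, and total number of ratings (scraped as the columns `title`, `herf`, `price`, and `mark`; the category column is added afterwards)\n"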
   ]
  },
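  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the category comparison described in PRD2, here is a minimal standard-library sketch. It assumes prices and rating counts have already been converted to numbers, and its records are hypothetical placeholders, not scraped results. It groups the records by category and reports the number of books, the median price, and the total number of ratings in each category."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "from statistics import median\n",
    "\n",
    "# Hypothetical, already-cleaned records: (category, price in USD, rating count).\n",
    "# These values are placeholders for illustration, not scraped data.\n",
    "records = [\n",
    "    ('Arts & Photography', 20.00, 2000),\n",
    "    ('Arts & Photography', 10.00, 1500),\n",
    "    ('Computers & Internet', 40.00, 500),\n",
    "    ('Computers & Internet', 30.00, 100),\n",
    "]\n",
    "\n",
    "\n",
    "# Group rows by category and return {category: (book count, median price, total ratings)}.\n",
    "def summarize(rows):\n",
    "    groups = defaultdict(list)\n",
    "    for category, price, ratings in rows:\n",
    "        groups[category].append((price, ratings))\n",
    "    return {\n",
    "        category: (len(items), median(p for p, _ in items), sum(r for _, r in items))\n",
    "        for category, items in groups.items()\n",
    "    }\n",
    "\n",
    "\n",
    "for category, (count, med_price, total) in summarize(records).items():\n",
    "    print(f'{category}: {count} books, median price US${med_price:.2f}, {total} ratings')"
   ]
  },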
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Mining Process"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a New Scrapy Project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "New Scrapy project 'amazon', using template directory 'f:\\anaconda3\\lib\\site-packages\\scrapy\\templates\\project', created in:\n",
      "    F:\\学习及工作文档\\web数据\\web\\amazon\n",
      "\n",
      "You can start your first spider with:\n",
      "    cd amazon\n",
      "    scrapy genspider example example.com\n"
     ]
    }
   ],
   "source": [
    "!scrapy startproject amazon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Change into the Project Directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F:\\学习及工作文档\\web数据\\web\\amazon\n"
     ]
    }
   ],
   "source": [
    "cd web\\amazon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the Spider File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Spider 'mobile' already exists in module:\n",
      "  amazon.spiders.mobile\n"
     ]
    }
   ],
   "source": [
    "!scrapy genspider mobile amazon.com"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Scrapy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['2,082', '1,687', '337', '4,255', '673', '437', '7,627', '4,446', '2,477', '1,374', '644', '119', '358', '492', '2,860']\n",
      "['692', '2,008', '2,087', '1,571', '182', '2,500', '67', '3,168', '271', '994', '351', '2,340', '2,257', '847', '37', '2,730']\n",
      "['931', '298', '1,893', '434', '1,200', '1,551', '1,406', '1,245', '366', '446', '74', '19,699', '693', '377', '542', '873']\n",
      "['1,514', '228', '208', '762', '517', '557', '647', '191', '175', '888', '671', '1,035', '692', '54', '769', '528']\n",
      "['2,318', '4,543', '2,369', '771', '962', '45', '458', '846', '35', '808', '884', '1,454', '819', '1,876', '277', '265']\n",
      "['809', '996', '544', '2,451', '425', '788', '585', '125', '2,604', '1,971', '756', '6,221', '405', '1,698', '93', '148']\n",
      "['625', '153', '13,101', '464', '674', '518', '691', '5', '542', '407', '2,275', '1,443', '316', '433', '418', '401']\n",
      "['800', '207', '1,430', '949', '1', '1,192', '1,295', '1,516', '309', '629', '516', '436', '466', '1,361', '895', '336']\n",
      "['1,304', '482', '208', '3,273', '1,365', '822', '54', '2,447', '450', '1,194', '267', '935', '773', '328', '1,588', '63']\n",
      "['598', '170', '162', '1,088', '247', '715', '256', '1,979', '1,021', '1,149', '314', '121', '429', '574', '594']\n",
      "['698', '1,028', '251', '1,185', '205', '266', '3,053', '1,150', '160', '737', '457', '239', '547', '368', '351', '1,859']\n",
      "['366', '818', '658', '483', '1,356', '1,225', '348', '52', '371', '390', '359', '1,076', '2,029', '427', '412', '742']\n",
      "['3,948', '1,485', '24', '1,402', '145', '1,054', '96', '218', '238', '724', '987', '3,163', '263', '654', '4,616', '137']\n",
      "['303', '16', '53', '106', '1,070', '567', '221', '421', '53', '2,332', '31', '646', '613', '3,013', '370', '619']\n",
      "['714', '62', '580', '411', '216', '574', '237', '368', '1,013', '142', '304', '119', '23', '1,449', '279', '1,231']\n",
      "['88', '417', '738', '683', '1,340', '414', '3,031', '142', '843', '664', '798', '178', '241', '1,184', '80', '1,078']\n",
      "['816', '930', '324', '552', '6', '514', '73', '1,581', '611', '67', '1,689', '261', '824', '253', '815', '179']\n",
      "['172', '443', '234', '588', '105', '37', '144', '631', '341', '153', '167', '60', '410', '257', '65', '244']\n",
      "['752', '827', '314', '98', '439', '236', '208', '1,462', '341', '118', '418', '2,455', '132', '310', '92', '715']\n",
      "['155', '2,195', '376', '931', '48', '76', '331', '298', '187', '276', '78', '4,099', '222', '78', '20', '94']\n",
      "['564', '215', '500', '71', '226', '392', '129', '389', '48', '233', '1,227', '3,995', '357', '460', '760', '192']\n",
      "['568', '696', '60', '311', '118', '1,040', '171', '212', '177', '698', '643', '306', '636', '260', '154', '649']\n",
      "['822', '123', '226', '439', '215', '356', '562', '13', '575', '513', '430', '523', '3', '374', '168', '1,490']\n",
      "['135', '803', '571', '1,043', '170', '91', '98', '1', '468', '77', '403', '3,663', '174', '1,114', '303', '153']\n",
      "['101', '1,614', '159', '734', '360', '78', '328', '83', '50', '55', '519', '731', '528', '372', '833', '124']\n",
      "['353', '580', '151', '605', '176', '770', '432', '15', '546', '224', '261', '604', '124', '59', '11', '780']\n",
      "['88', '423', '351', '483', '543', '141', '244', '1,873', '206', '33', '364', '124', '1,796', '380', '855', '63']\n",
      "['579', '392', '176', '1,194', '804', '350', '178', '86', '722', '106', '215', '307', '234', '467', '153', '1,750']\n",
      "['124', '255', '421', '2', '27', '2,019', '252', '379', '106', '276', '283', '889', '150', '663', '1,119', '213']\n",
      "['474', '134', '757', '839', '223', '545', '291', '387', '340', '421', '776', '304', '212', '372', '384', '4']\n",
      "['61', '636', '654', '903', '88', '176', '407', '513', '679', '479', '2,287', '209', '22', '675', '661', '166']\n",
      "['119', '270', '353', '542', '494', '316', '1,130', '333', '356', '4', '201', '547', '238', '2,307', '183', '1,605']\n",
      "['337', '840', '204', '584', '45', '2,300', '1,827', '65', '314', '187', '540', '472', '49', '246', '159', '230']\n",
      "['146', '1,154', '114', '379', '41', '31', '376', '102', '538', '555', '528', '43', '975', '217', '1,036', '150']\n",
      "['356', '102', '218', '9', '1,277', '747', '65', '411', '666', '22', '438', '373', '102', '101', '136', '270']\n",
      "['315', '1,649', '298', '276', '429', '384', '128', '34', '5', '44', '64', '278', '370', '54', '376', '581']\n",
      "['266', '326', '495', '361', '332', '283', '190', '91', '1,816', '60', '177', '1', '390', '649', '113', '85']\n",
      "['91', '135', '703', '81', '411', '166', '214', '269', '236', '344', '169', '482', '424', '402', '47', '360']\n",
      "['387', '103', '258', '101', '57', '531', '68', '199', '3', '109', '15', '224', '60', '157', '55']\n",
      "['136', '662', '932', '395', '50', '520', '1,075', '678', '569', '185', '121', '248', '233', '93', '236', '370']\n",
      "['102', '348', '840', '76', '137', '642', '335', '117', '105', '1', '271', '333', '193', '150', '26', '142']\n",
      "['92', '375', '541', '83', '125', '18', '286', '352', '162', '505', '1,223', '519', '4', '294', '78', '620']\n",
      "['154', '430', '43', '213', '834', '196', '35', '97', '225', '86', '620', '213', '376', '9', '52', '229']\n",
      "['164', '5,558', '133', '151', '483', '6', '1,590', '445', '427', '829', '14', '25', '257', '25', '90']\n",
      "['1,265', '206', '31', '876', '429', '431', '23', '251', '151', '544', '44', '1,773', '355', '23', '94']\n",
      "['150', '73', '138', '95', '123', '905', '752', '201', '94', '649', '613', '160', '93', '124', '117', '158']\n",
      "['385', '553', '74', '87', '642', '46', '50', '476', '147', '334', '65', '762', '365', '256', '287', '222']\n",
      "['1,758', '100', '1,894', '47', '354', '201', '204', '700', '13', '545', '130', '173', '148', '88', '649', '133']\n",
      "['270', '300', '340', '72', '138', '516', '353', '387', '215', '37', '141', '21', '278', '380', '505']\n",
      "['167', '185', '51', '73', '619', '76', '318', '202', '132', '195', '1', '640', '374', '22', '141', '131']\n",
      "['150', '59', '162', '197', '437', '200', '357', '735', '223', '137', '1,723', '3', '691', '98', '130', '67']\n",
      "['274', '4', '348', '102', '92', '452', '563', '282', '33', '431', '29', '252', '586', '28', '238']\n",
      "['24', '614', '502', '641', '67', '369', '1,461', '36', '1,047', '71', '132', '117', '543', '60', '350', '1,162']\n",
      "['520', '352', '82', '52', '238', '156', '556', '139', '403', '715', '528', '87', '868', '89', '18', '99']\n",
      "['94', '218', '260', '197', '199', '458', '410', '252', '325', '143', '184', '2,190', '219', '223', '1,024']\n",
      "['904', '203', '105', '410', '329', '320', '18', '484', '11', '108', '397', '407', '188', '33', '354', '100']\n",
      "['131', '109', '192', '4', '941', '78', '678', '1,248', '114', '794', '68', '627', '122', '86', '274', '166']\n",
      "['262', '439', '111', '262', '54', '394', '43', '166', '120', '144', '177', '199', '15', '238', '346', '475']\n",
      "['63', '121', '23', '115', '85', '266', '780', '221', '448', '131', '1,377', '220', '493', '449', '2', '244']\n",
      "['217', '132', '292', '124', '87', '121', '810', '53', '328', '146', '553', '221', '2,014', '62', '192', '97']\n",
      "['261', '6', '14', '77', '464', '342', '351', '613', '114', '396', '51', '35', '98', '1,288', '129', '45']\n",
      "['197', '42', '654', '345', '131', '234', '329', '113', '94', '391', '329', '13', '946', '166', '286', '841']\n",
      "['181', '6', '114', '143', '44', '608', '265', '172', '455', '142', '61', '58', '319', '349', '26', '2']\n",
      "['1,458', '1,021', '313', '104', '99', '222', '146', '594', '2,409', '148', '179', '122', '454', '250', '226', '124']\n",
      "['211', '647', '307', '191', '129', '707', '313', '205', '325', '106', '239', '228', '601', '198', '904', '55']\n",
      "['33', '30', '2', '54', '503', '312', '191', '353', '294', '611', '139', '1,006', '568', '290', '22', '124']\n",
      "['690', '67', '178', '39', '339', '1,203', '934', '98', '225', '95', '217', '33', '425', '322', '4', '210']\n",
      "['210', '97', '155', '506', '134', '122', '623', '166', '239', '275', '452', '863', '759', '186', '751', '83']\n",
      "['117', '115', '75', '71', '433', '433', '31', '83', '161', '269', '234', '506', '85', '45', '91', '73']\n",
      "['150', '131', '579', '780', '659', '586', '104', '54', '267', '94', '16', '158', '34', '195', '105', '96']\n",
      "['315', '463', '178', '360', '171', '244', '348', '3,665', '345', '133', '209', '209', '311', '3', '565', '176']\n",
      "['123', '333', '316', '209', '74', '73', '55', '89', '177', '294', '845', '84', '150', '10', '47', '178']\n",
      "['86', '3,328', '17', '16', '227', '269', '144', '226', '107', '16', '205', '405', '291', '60', '78', '66']\n",
      "['142', '213', '229', '32', '152', '67', '138', '116', '156', '36', '6', '24', '75', '189', '662']\n",
      "[]\n"
     ]
    }
   ],
   "source": [
    "! scrapy crawl mobile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
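    "# Export the scraped items to a file (uncomment the command to run it).\n",
    "# Note: a .json feed is a single JSON array, whereas pd.read_json(..., lines=True) below expects JSON Lines (for example a .jl feed).\n",
    "# ! scrapy crawl mobile -o output.json"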
   ]
  },
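  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scrapy chooses the feed exporter from the file extension given to `-o`: a `.json` file receives one JSON array, while a `.jl` file receives JSON Lines (one object per line), the layout that `pd.read_json(..., lines=True)` expects. The sketch below uses only the standard library and hypothetical items to load either layout into the same list of dicts; it assumes every scraped item is a JSON object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "\n",
    "# Load scraped items from either a JSON array (.json feed) or JSON Lines (.jl feed).\n",
    "# Assumes each item is a JSON object, so a leading '[' can only mean an array.\n",
    "def load_items(text):\n",
    "    stripped = text.lstrip()\n",
    "    if stripped.startswith('['):\n",
    "        return json.loads(stripped)\n",
    "    return [json.loads(line) for line in stripped.splitlines() if line.strip()]\n",
    "\n",
    "\n",
    "# Hypothetical items that reuse the scraped field names.\n",
    "items = [\n",
    "    {'title': 'Book A', 'price': 'US$9.99', 'mark': '1,687'},\n",
    "    {'title': 'Book B', 'price': 'US$19.99', 'mark': '2,082'},\n",
    "]\n",
    "as_array = json.dumps(items)\n",
    "as_lines = '\\n'.join(json.dumps(item) for item in items)\n",
    "\n",
    "# Both layouts should round-trip to the same list of dicts.\n",
    "print(load_items(as_array) == items, load_items(as_lines) == items)"
   ]
  },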
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect the Exported Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>herf</th>\n",
       "      <th>price</th>\n",
       "      <th>mark</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Homebody: A Guide to Creating Spaces You Never...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Homebody-Guide-Cre...</td>\n",
       "      <td>US$19.99</td>\n",
       "      <td>2,082</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>100 Amazing Patterns: An Adult Coloring Book w...</td>\n",
       "      <td>https://www.amazon.com/-/zh/100-Amazing-Patter...</td>\n",
       "      <td>US$9.99</td>\n",
       "      <td>1,687</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Hustle Harder, Hustle Smarter</td>\n",
       "      <td>https://www.amazon.com/-/zh/dp/B07Z9JKFF2/ref=...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lettering and Modern Calligraphy: A Beginner's...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Lettering-Modern-C...</td>\n",
       "      <td>US$14.99</td>\n",
       "      <td>4,255</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Tom Ford</td>\n",
       "      <td>https://www.amazon.com/-/zh/Tom-Ford/dp/084782...</td>\n",
       "      <td>US$17.05</td>\n",
       "      <td>673</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1098</th>\n",
       "      <td>Black Acting Methods</td>\n",
       "      <td>https://www.amazon.com/-/zh/Black-Acting-Metho...</td>\n",
       "      <td>US$10.72</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1099</th>\n",
       "      <td>Tom Scheerer: More Decorating</td>\n",
       "      <td>https://www.amazon.com/-/zh/Tom-Scheerer-More-...</td>\n",
       "      <td>US$111.26</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1100</th>\n",
       "      <td>The Understanding by Design Guide to Creating ...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Understanding-Desi...</td>\n",
       "      <td>US$33.99</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1101</th>\n",
       "      <td>Mary Engelbreit 2021 Monthly/Weekly Planner Ca...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Engelbreit-Monthly...</td>\n",
       "      <td>US$16.99</td>\n",
       "      <td>189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1102</th>\n",
       "      <td>Descendants 2 A Wickedly Cool Coloring Book (A...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Descendants-Wicked...</td>\n",
       "      <td>US$13.76</td>\n",
       "      <td>662</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1103 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title  \\\n",
       "0     Homebody: A Guide to Creating Spaces You Never...   \n",
       "1     100 Amazing Patterns: An Adult Coloring Book w...   \n",
       "2                         Hustle Harder, Hustle Smarter   \n",
       "3     Lettering and Modern Calligraphy: A Beginner's...   \n",
       "4                                              Tom Ford   \n",
       "...                                                 ...   \n",
       "1098                               Black Acting Methods   \n",
       "1099                      Tom Scheerer: More Decorating   \n",
       "1100  The Understanding by Design Guide to Creating ...   \n",
       "1101  Mary Engelbreit 2021 Monthly/Weekly Planner Ca...   \n",
       "1102  Descendants 2 A Wickedly Cool Coloring Book (A...   \n",
       "\n",
       "                                                   herf      price   mark  \n",
       "0     https://www.amazon.com/-/zh/Homebody-Guide-Cre...   US$19.99  2,082  \n",
       "1     https://www.amazon.com/-/zh/100-Amazing-Patter...    US$9.99  1,687  \n",
       "2     https://www.amazon.com/-/zh/dp/B07Z9JKFF2/ref=...    US$0.00    337  \n",
       "3     https://www.amazon.com/-/zh/Lettering-Modern-C...   US$14.99  4,255  \n",
       "4     https://www.amazon.com/-/zh/Tom-Ford/dp/084782...   US$17.05    673  \n",
       "...                                                 ...        ...    ...  \n",
       "1098  https://www.amazon.com/-/zh/Black-Acting-Metho...   US$10.72      6  \n",
       "1099  https://www.amazon.com/-/zh/Tom-Scheerer-More-...  US$111.26     24  \n",
       "1100  https://www.amazon.com/-/zh/Understanding-Desi...   US$33.99     75  \n",
       "1101  https://www.amazon.com/-/zh/Engelbreit-Monthly...   US$16.99    189  \n",
       "1102  https://www.amazon.com/-/zh/Descendants-Wicked...   US$13.76    662  \n",
       "\n",
       "[1103 rows x 4 columns]"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# amazon.json stores one JSON object per line (JSON Lines), hence lines=True\n",
    "df3 = pd.read_json('amazon.json', lines=True)\n",
    "df3"
   ]
  },
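  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `price` and `mark` columns above hold text such as `US$19.99` and `2,082`, so they need converting before any numeric analysis. A minimal sketch of two helper functions, assuming every price uses the `US$` prefix seen in the sample rows; anything else (a price range, for example) is returned as `None` rather than guessed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert a scraped price such as 'US$19.99' to a float; None if it cannot be parsed.\n",
    "# str() also accepts values that pandas may already have parsed as numbers.\n",
    "def parse_price(text):\n",
    "    cleaned = str(text).replace('US$', '').replace(',', '').strip()\n",
    "    try:\n",
    "        return float(cleaned)\n",
    "    except ValueError:\n",
    "        return None\n",
    "\n",
    "\n",
    "# Convert a rating count such as '2,082' to an int; None if it cannot be parsed.\n",
    "def parse_mark(text):\n",
    "    try:\n",
    "        return int(str(text).replace(',', '').strip())\n",
    "    except ValueError:\n",
    "        return None\n",
    "\n",
    "\n",
    "print(parse_price('US$111.26'), parse_mark('2,082'))\n",
    "\n",
    "# With the DataFrame loaded above, the helpers could be applied column-wise, e.g.:\n",
    "# df3['price_usd'] = df3['price'].map(parse_price)\n",
    "# df3['ratings'] = df3['mark'].map(parse_mark)"
   ]
  },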
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
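    "# Remove rows duplicated across all four scraped fields; .copy() avoids a possible SettingWithCopyWarning when a column is added later\n",
    "df4 = df3.drop_duplicates(['title', 'herf', 'price', 'mark']).copy()"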
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>herf</th>\n",
       "      <th>price</th>\n",
       "      <th>mark</th>\n",
       "      <th>分类</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Homebody: A Guide to Creating Spaces You Never...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Homebody-Guide-Cre...</td>\n",
       "      <td>US$19.99</td>\n",
       "      <td>2,082</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>100 Amazing Patterns: An Adult Coloring Book w...</td>\n",
       "      <td>https://www.amazon.com/-/zh/100-Amazing-Patter...</td>\n",
       "      <td>US$9.99</td>\n",
       "      <td>1,687</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Hustle Harder, Hustle Smarter</td>\n",
       "      <td>https://www.amazon.com/-/zh/dp/B07Z9JKFF2/ref=...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>337</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lettering and Modern Calligraphy: A Beginner's...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Lettering-Modern-C...</td>\n",
       "      <td>US$14.99</td>\n",
       "      <td>4,255</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Tom Ford</td>\n",
       "      <td>https://www.amazon.com/-/zh/Tom-Ford/dp/084782...</td>\n",
       "      <td>US$17.05</td>\n",
       "      <td>673</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1098</th>\n",
       "      <td>Black Acting Methods</td>\n",
       "      <td>https://www.amazon.com/-/zh/Black-Acting-Metho...</td>\n",
       "      <td>US$10.72</td>\n",
       "      <td>6</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1099</th>\n",
       "      <td>Tom Scheerer: More Decorating</td>\n",
       "      <td>https://www.amazon.com/-/zh/Tom-Scheerer-More-...</td>\n",
       "      <td>US$111.26</td>\n",
       "      <td>24</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1100</th>\n",
       "      <td>The Understanding by Design Guide to Creating ...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Understanding-Desi...</td>\n",
       "      <td>US$33.99</td>\n",
       "      <td>75</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1101</th>\n",
       "      <td>Mary Engelbreit 2021 Monthly/Weekly Planner Ca...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Engelbreit-Monthly...</td>\n",
       "      <td>US$16.99</td>\n",
       "      <td>189</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1102</th>\n",
       "      <td>Descendants 2 A Wickedly Cool Coloring Book (A...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Descendants-Wicked...</td>\n",
       "      <td>US$13.76</td>\n",
       "      <td>662</td>\n",
       "      <td>艺术与摄影</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1103 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title  \\\n",
       "0     Homebody: A Guide to Creating Spaces You Never...   \n",
       "1     100 Amazing Patterns: An Adult Coloring Book w...   \n",
       "2                         Hustle Harder, Hustle Smarter   \n",
       "3     Lettering and Modern Calligraphy: A Beginner's...   \n",
       "4                                              Tom Ford   \n",
       "...                                                 ...   \n",
       "1098                               Black Acting Methods   \n",
       "1099                      Tom Scheerer: More Decorating   \n",
       "1100  The Understanding by Design Guide to Creating ...   \n",
       "1101  Mary Engelbreit 2021 Monthly/Weekly Planner Ca...   \n",
       "1102  Descendants 2 A Wickedly Cool Coloring Book (A...   \n",
       "\n",
       "                                                   herf      price   mark  \\\n",
       "0     https://www.amazon.com/-/zh/Homebody-Guide-Cre...   US$19.99  2,082   \n",
       "1     https://www.amazon.com/-/zh/100-Amazing-Patter...    US$9.99  1,687   \n",
       "2     https://www.amazon.com/-/zh/dp/B07Z9JKFF2/ref=...    US$0.00    337   \n",
       "3     https://www.amazon.com/-/zh/Lettering-Modern-C...   US$14.99  4,255   \n",
       "4     https://www.amazon.com/-/zh/Tom-Ford/dp/084782...   US$17.05    673   \n",
       "...                                                 ...        ...    ...   \n",
       "1098  https://www.amazon.com/-/zh/Black-Acting-Metho...   US$10.72      6   \n",
       "1099  https://www.amazon.com/-/zh/Tom-Scheerer-More-...  US$111.26     24   \n",
       "1100  https://www.amazon.com/-/zh/Understanding-Desi...   US$33.99     75   \n",
       "1101  https://www.amazon.com/-/zh/Engelbreit-Monthly...   US$16.99    189   \n",
       "1102  https://www.amazon.com/-/zh/Descendants-Wicked...   US$13.76    662   \n",
       "\n",
       "         分类  \n",
       "0     艺术与摄影  \n",
       "1     艺术与摄影  \n",
       "2     艺术与摄影  \n",
       "3     艺术与摄影  \n",
       "4     艺术与摄影  \n",
       "...     ...  \n",
       "1098  艺术与摄影  \n",
       "1099  艺术与摄影  \n",
       "1100  艺术与摄影  \n",
       "1101  艺术与摄影  \n",
       "1102  艺术与摄影  \n",
       "\n",
       "[1103 rows x 5 columns]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Label every row with its category (Arts & Photography)\n",
    "df4[\"分类\"] = '艺术与摄影'\n",
    "df4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write to Excel; .xlsx output needs an Excel engine such as openpyxl installed\n",
    "df4.to_excel(\"amazon_book1.xlsx\", sheet_name=\"艺术与摄影\")"
   ]
  },
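  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The search-result URLs printed further down follow a fixed pattern in which only `page`, `ref`, and `qid` change, so page URLs can be generated directly. A minimal sketch with `urllib.parse.urlencode`: the parameter values are copied from the printed URLs, and `qid` (which looks like a Unix timestamp) is left out on the unverified assumption that the site does not require it. Each generated URL could then be passed to `scrapy.Request` inside the spider."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.parse import urlencode\n",
    "\n",
    "# Search endpoint and fixed query parameters copied from the printed page URLs.\n",
    "# 'qid' is omitted on the assumption that it is not required (unverified).\n",
    "BASE_URL = 'https://www.amazon.com/-/zh/s'\n",
    "\n",
    "\n",
    "def page_url(page):\n",
    "    params = {\n",
    "        'i': 'stripbooks',\n",
    "        'rh': 'n:283155,n:1000,n:5',  # urlencode escapes ':' and ',' as %3A and %2C\n",
    "        'page': page,\n",
    "        'language': 'zh',\n",
    "        'ref': f'sr_pg_{page - 1}',  # the printed URLs use ref=sr_pg_<previous page>\n",
    "    }\n",
    "    return BASE_URL + '?' + urlencode(params)\n",
    "\n",
    "\n",
    "# Example: URLs for result pages 3 to 5.\n",
    "for url in (page_url(p) for p in range(3, 6)):\n",
    "    print(url)"
   ]
  },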
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=3&language=zh&qid=1595129554&ref=sr_pg_2\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=4&language=zh&qid=1595129558&ref=sr_pg_3\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=5&language=zh&qid=1595129563&ref=sr_pg_4\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=6&language=zh&qid=1595129567&ref=sr_pg_5\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=7&language=zh&qid=1595129571&ref=sr_pg_6\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=8&language=zh&qid=1595129578&ref=sr_pg_7\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=9&language=zh&qid=1595129583&ref=sr_pg_8\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=10&language=zh&qid=1595129587&ref=sr_pg_9\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=11&language=zh&qid=1595129591&ref=sr_pg_10\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=12&language=zh&qid=1595129596&ref=sr_pg_11\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=13&language=zh&qid=1595129600&ref=sr_pg_12\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=14&language=zh&qid=1595129605&ref=sr_pg_13\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=15&language=zh&qid=1595129610&ref=sr_pg_14\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=16&language=zh&qid=1595129615&ref=sr_pg_15\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=17&language=zh&qid=1595129619&ref=sr_pg_16\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=18&language=zh&qid=1595129623&ref=sr_pg_17\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=19&language=zh&qid=1595129628&ref=sr_pg_18\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=20&language=zh&qid=1595129632&ref=sr_pg_19\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=21&language=zh&qid=1595129637&ref=sr_pg_20\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=22&language=zh&qid=1595129641&ref=sr_pg_21\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=23&language=zh&qid=1595129646&ref=sr_pg_22\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=24&language=zh&qid=1595129651&ref=sr_pg_23\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=25&language=zh&qid=1595129657&ref=sr_pg_24\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=26&language=zh&qid=1595129662&ref=sr_pg_25\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=27&language=zh&qid=1595129666&ref=sr_pg_26\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=28&language=zh&qid=1595129671&ref=sr_pg_27\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=29&language=zh&qid=1595129675&ref=sr_pg_28\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=30&language=zh&qid=1595129680&ref=sr_pg_29\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=31&language=zh&qid=1595129685&ref=sr_pg_30\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=32&language=zh&qid=1595129690&ref=sr_pg_31\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=33&language=zh&qid=1595129697&ref=sr_pg_32\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=34&language=zh&qid=1595129701&ref=sr_pg_33\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=35&language=zh&qid=1595129706&ref=sr_pg_34\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=36&language=zh&qid=1595129711&ref=sr_pg_35\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=37&language=zh&qid=1595129715&ref=sr_pg_36\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=38&language=zh&qid=1595129719&ref=sr_pg_37\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=39&language=zh&qid=1595129724&ref=sr_pg_38\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=40&language=zh&qid=1595129728&ref=sr_pg_39\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=41&language=zh&qid=1595129734&ref=sr_pg_40\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=42&language=zh&qid=1595129739&ref=sr_pg_41\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=43&language=zh&qid=1595129743&ref=sr_pg_42\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=44&language=zh&qid=1595129747&ref=sr_pg_43\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=45&language=zh&qid=1595129753&ref=sr_pg_44\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=46&language=zh&qid=1595129757&ref=sr_pg_45\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=47&language=zh&qid=1595129764&ref=sr_pg_46\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=48&language=zh&qid=1595129768&ref=sr_pg_47\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=49&language=zh&qid=1595129773&ref=sr_pg_48\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=50&language=zh&qid=1595129778&ref=sr_pg_49\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=51&language=zh&qid=1595129782&ref=sr_pg_50\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=52&language=zh&qid=1595129787&ref=sr_pg_51\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=53&language=zh&qid=1595129793&ref=sr_pg_52\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=54&language=zh&qid=1595129798&ref=sr_pg_53\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=55&language=zh&qid=1595129802&ref=sr_pg_54\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=56&language=zh&qid=1595129806&ref=sr_pg_55\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=57&language=zh&qid=1595129811&ref=sr_pg_56\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=58&language=zh&qid=1595129815&ref=sr_pg_57\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=59&language=zh&qid=1595129819&ref=sr_pg_58\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=60&language=zh&qid=1595129824&ref=sr_pg_59\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=61&language=zh&qid=1595129828&ref=sr_pg_60\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=62&language=zh&qid=1595129833&ref=sr_pg_61\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=63&language=zh&qid=1595129838&ref=sr_pg_62\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=64&language=zh&qid=1595129843&ref=sr_pg_63\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=65&language=zh&qid=1595129848&ref=sr_pg_64\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=66&language=zh&qid=1595129856&ref=sr_pg_65\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=67&language=zh&qid=1595129860&ref=sr_pg_66\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=68&language=zh&qid=1595129865&ref=sr_pg_67\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=69&language=zh&qid=1595129870&ref=sr_pg_68\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=70&language=zh&qid=1595129875&ref=sr_pg_69\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=71&language=zh&qid=1595129882&ref=sr_pg_70\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=72&language=zh&qid=1595129887&ref=sr_pg_71\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=73&language=zh&qid=1595129892&ref=sr_pg_72\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=74&language=zh&qid=1595129897&ref=sr_pg_73\n",
      "https://www.amazon.com/-/zh/s?i=stripbooks&rh=n%3A283155%2Cn%3A1000%2Cn%3A5&page=75&language=zh&qid=1595129901&ref=sr_pg_74\n",
      "https://www.amazon.com\n",
      "https://www.amazon.com\n"
     ]
    }
   ],
   "source": [
    "# Run the spider; it prints the URL of each result page it visits\n",
    "! scrapy crawl mobile"
   ]
  },
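  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The crawl log above shows the spider walking through the search-result pages of one category by incrementing the `page` query parameter (up to page=75 here). A minimal sketch of building the same page URLs directly with the standard library follows. The base path and the `i`, `rh` and `language` values are copied from the log. Leaving out the `qid` and `ref` tracking parameters is an assumption, and `build_page_urls` is a hypothetical helper, not part of the spider.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urlencode\n",
    "\n",
    "BASE = 'https://www.amazon.com/-/zh/s'  # search path seen in the crawl log\n",
    "\n",
    "\n",
    "def build_page_urls(rh, first_page, last_page, language='zh'):\n",
    "    # Hypothetical helper: one search-result URL per page number, inclusive range.\n",
    "    # The qid/ref tracking parameters from the log are deliberately omitted.\n",
    "    urls = []\n",
    "    for page in range(first_page, last_page + 1):\n",
    "        query = urlencode({'i': 'stripbooks', 'rh': rh, 'page': page, 'language': language})\n",
    "        urls.append(BASE + '?' + query)\n",
    "    return urls\n",
    "\n",
    "\n",
    "# rh value from the log, URL-decoded; urlencode turns ':' into %3A and ',' into %2C again.\n",
    "urls = build_page_urls('n:283155,n:1000,n:5', 1, 75)\n",
    "print(len(urls))  # 75\n",
    "print(urls[0])\n",
    "```\n",
    "\n",
    "A fixed range makes the page count explicit (the product statement expects 70+ pages per keyword), but it has to be kept in line with how many pages Amazon actually serves. Following the pages' own pagination links instead stops on its own when the results run out.\n"
   ]
  },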
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>herf</th>\n",
       "      <th>price</th>\n",
       "      <th>mark</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Daring Greatly: How the Courage to Be Vulnerab...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Daring-Greatly-Cou...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>2,447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Limitless: Upgrade Your Brain, Learn Anything ...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Limitless-Upgrade-...</td>\n",
       "      <td>US$12.99</td>\n",
       "      <td>704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Outliers: The Story of Success</td>\n",
       "      <td>https://www.amazon.com/-/zh/Outliers-Story-Suc...</td>\n",
       "      <td>US$12.99</td>\n",
       "      <td>7,780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>知识项目管理机构指南（PMBOK® 指南）– 第六版</td>\n",
       "      <td>https://www.amazon.com/-/zh/Project-Management...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>1,176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Thinking, Fast and Slow</td>\n",
       "      <td>https://www.amazon.com/-/zh/Thinking-Fast-Slow...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>8,035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1173</th>\n",
       "      <td>Feck Perfuction: Dangerous Ideas on the Busine...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Feck-Perfuction-Da...</td>\n",
       "      <td>US$14.46</td>\n",
       "      <td>156</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1174</th>\n",
       "      <td>The Power of Full Engagement: Managing Energy,...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Power-Full-Engagem...</td>\n",
       "      <td>US$13.99</td>\n",
       "      <td>433</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1175</th>\n",
       "      <td>The Innovators: How a Group of Hackers, Genius...</td>\n",
       "      <td>https://www.amazon.com/-/zh/The-Innovators-Wal...</td>\n",
       "      <td>US$14.99</td>\n",
       "      <td>1,367</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1176</th>\n",
       "      <td>HBR's 10 Must Reads On Strategy</td>\n",
       "      <td>https://www.amazon.com/-/zh/HBRs-10-Must-Reads...</td>\n",
       "      <td>US$45.12</td>\n",
       "      <td>248</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1177</th>\n",
       "      <td>The Strangest Secret</td>\n",
       "      <td>https://www.amazon.com/-/zh/Strangest-Secret-E...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>912</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1178 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title  \\\n",
       "0     Daring Greatly: How the Courage to Be Vulnerab...   \n",
       "1     Limitless: Upgrade Your Brain, Learn Anything ...   \n",
       "2                        Outliers: The Story of Success   \n",
       "3                            知识项目管理机构指南（PMBOK® 指南）– 第六版   \n",
       "4                               Thinking, Fast and Slow   \n",
       "...                                                 ...   \n",
       "1173  Feck Perfuction: Dangerous Ideas on the Busine...   \n",
       "1174  The Power of Full Engagement: Managing Energy,...   \n",
       "1175  The Innovators: How a Group of Hackers, Genius...   \n",
       "1176                    HBR's 10 Must Reads On Strategy   \n",
       "1177                               The Strangest Secret   \n",
       "\n",
       "                                                   herf     price   mark  \n",
       "0     https://www.amazon.com/-/zh/Daring-Greatly-Cou...   US$0.00  2,447  \n",
       "1     https://www.amazon.com/-/zh/Limitless-Upgrade-...  US$12.99    704  \n",
       "2     https://www.amazon.com/-/zh/Outliers-Story-Suc...  US$12.99  7,780  \n",
       "3     https://www.amazon.com/-/zh/Project-Management...   US$0.00  1,176  \n",
       "4     https://www.amazon.com/-/zh/Thinking-Fast-Slow...   US$0.00  8,035  \n",
       "...                                                 ...       ...    ...  \n",
       "1173  https://www.amazon.com/-/zh/Feck-Perfuction-Da...  US$14.46    156  \n",
       "1174  https://www.amazon.com/-/zh/Power-Full-Engagem...  US$13.99    433  \n",
       "1175  https://www.amazon.com/-/zh/The-Innovators-Wal...  US$14.99  1,367  \n",
       "1176  https://www.amazon.com/-/zh/HBRs-10-Must-Reads...  US$45.12    248  \n",
       "1177  https://www.amazon.com/-/zh/Strangest-Secret-E...   US$0.00    912  \n",
       "\n",
       "[1178 rows x 4 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
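    "import pandas as pd\n",
    "\n",
    "# amazon.json holds one scraped item per line (JSON Lines), hence lines=True\n",
    "df = pd.read_json('amazon.json', lines=True)\n",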
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = df.drop_duplicates(['title', 'herf', 'price', 'mark']).copy()  # .copy(): independent frame, so adding a column to df2 is safe"
   ]
  },
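  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The raw frame above has 1178 rows and the labelled `df2` shown next also has 1178 rows, so `drop_duplicates` found no exact repeats in this crawl. A small helper that reports how many rows were dropped makes that check explicit instead of relying on comparing two printed tables. `dedupe_report` is a hypothetical name, and the toy frame is made up purely to show the call.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def dedupe_report(df, subset):\n",
    "    # Drop rows that repeat on every column in `subset`, keeping the first occurrence.\n",
    "    deduped = df.drop_duplicates(subset=subset).copy()\n",
    "    # Also return how many rows were removed.\n",
    "    return deduped, len(df) - len(deduped)\n",
    "\n",
    "\n",
    "# Toy records in the scraped shape (values are made up for illustration).\n",
    "toy = pd.DataFrame({\n",
    "    'title': ['Book A', 'Book A', 'Book B'],\n",
    "    'herf': ['link-a', 'link-a', 'link-b'],\n",
    "    'price': ['US$1.00', 'US$1.00', 'US$2.00'],\n",
    "    'mark': ['10', '10', '3'],\n",
    "})\n",
    "deduped, dropped = dedupe_report(toy, ['title', 'herf', 'price', 'mark'])\n",
    "print(dropped)  # 1\n",
    "```\n",
    "\n",
    "In the notebook this would be `df2, dropped = dedupe_report(df, ['title', 'herf', 'price', 'mark'])`. Matching on all four columns only removes exact repeats: the same book listed with a different price string on another page would survive.\n"
   ]
  },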
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>herf</th>\n",
       "      <th>price</th>\n",
       "      <th>mark</th>\n",
       "      <th>分类</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Daring Greatly: How the Courage to Be Vulnerab...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Daring-Greatly-Cou...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>2,447</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Limitless: Upgrade Your Brain, Learn Anything ...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Limitless-Upgrade-...</td>\n",
       "      <td>US$12.99</td>\n",
       "      <td>704</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Outliers: The Story of Success</td>\n",
       "      <td>https://www.amazon.com/-/zh/Outliers-Story-Suc...</td>\n",
       "      <td>US$12.99</td>\n",
       "      <td>7,780</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>知识项目管理机构指南（PMBOK® 指南）– 第六版</td>\n",
       "      <td>https://www.amazon.com/-/zh/Project-Management...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>1,176</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Thinking, Fast and Slow</td>\n",
       "      <td>https://www.amazon.com/-/zh/Thinking-Fast-Slow...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>8,035</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1173</th>\n",
       "      <td>Feck Perfuction: Dangerous Ideas on the Busine...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Feck-Perfuction-Da...</td>\n",
       "      <td>US$14.46</td>\n",
       "      <td>156</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1174</th>\n",
       "      <td>The Power of Full Engagement: Managing Energy,...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Power-Full-Engagem...</td>\n",
       "      <td>US$13.99</td>\n",
       "      <td>433</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1175</th>\n",
       "      <td>The Innovators: How a Group of Hackers, Genius...</td>\n",
       "      <td>https://www.amazon.com/-/zh/The-Innovators-Wal...</td>\n",
       "      <td>US$14.99</td>\n",
       "      <td>1,367</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1176</th>\n",
       "      <td>HBR's 10 Must Reads On Strategy</td>\n",
       "      <td>https://www.amazon.com/-/zh/HBRs-10-Must-Reads...</td>\n",
       "      <td>US$45.12</td>\n",
       "      <td>248</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1177</th>\n",
       "      <td>The Strangest Secret</td>\n",
       "      <td>https://www.amazon.com/-/zh/Strangest-Secret-E...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>912</td>\n",
       "      <td>经济管理</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1178 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title  \\\n",
       "0     Daring Greatly: How the Courage to Be Vulnerab...   \n",
       "1     Limitless: Upgrade Your Brain, Learn Anything ...   \n",
       "2                        Outliers: The Story of Success   \n",
       "3                            知识项目管理机构指南（PMBOK® 指南）– 第六版   \n",
       "4                               Thinking, Fast and Slow   \n",
       "...                                                 ...   \n",
       "1173  Feck Perfuction: Dangerous Ideas on the Busine...   \n",
       "1174  The Power of Full Engagement: Managing Energy,...   \n",
       "1175  The Innovators: How a Group of Hackers, Genius...   \n",
       "1176                    HBR's 10 Must Reads On Strategy   \n",
       "1177                               The Strangest Secret   \n",
       "\n",
       "                                                   herf     price   mark    分类  \n",
       "0     https://www.amazon.com/-/zh/Daring-Greatly-Cou...   US$0.00  2,447  经济管理  \n",
       "1     https://www.amazon.com/-/zh/Limitless-Upgrade-...  US$12.99    704  经济管理  \n",
       "2     https://www.amazon.com/-/zh/Outliers-Story-Suc...  US$12.99  7,780  经济管理  \n",
       "3     https://www.amazon.com/-/zh/Project-Management...   US$0.00  1,176  经济管理  \n",
       "4     https://www.amazon.com/-/zh/Thinking-Fast-Slow...   US$0.00  8,035  经济管理  \n",
       "...                                                 ...       ...    ...   ...  \n",
       "1173  https://www.amazon.com/-/zh/Feck-Perfuction-Da...  US$14.46    156  经济管理  \n",
       "1174  https://www.amazon.com/-/zh/Power-Full-Engagem...  US$13.99    433  经济管理  \n",
       "1175  https://www.amazon.com/-/zh/The-Innovators-Wal...  US$14.99  1,367  经济管理  \n",
       "1176  https://www.amazon.com/-/zh/HBRs-10-Must-Reads...  US$45.12    248  经济管理  \n",
       "1177  https://www.amazon.com/-/zh/Strangest-Secret-E...   US$0.00    912  经济管理  \n",
       "\n",
       "[1178 rows x 5 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2[\"分类\"] = '经济管理'  # category column (分类): Economics & Management\n",
    "df2"
   ]
  },
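  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`price` and `mark` are still text (`US$12.99`, `2,447`), so the frame cannot yet be sorted or averaged numerically for the price and popularity comparisons the MVP aims at. A minimal cleaning sketch follows. It assumes every price carries the `US$` prefix seen above and that `mark` is the total rating count, written with thousands separators. The function and the `price_usd` / `mark_count` column names are hypothetical. Several rows show `US$0.00`; treating those as missing rather than free is an assumption worth checking against the product pages.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def to_numeric_columns(df):\n",
    "    # Return a copy with numeric price (USD) and rating-count columns added.\n",
    "    out = df.copy()\n",
    "    # Keep only digits and the decimal point: 'US$12.99' -> '12.99'.\n",
    "    price = pd.to_numeric(out['price'].astype(str).str.replace('[^0-9.]', '', regex=True), errors='coerce')\n",
    "    # Assumption: a scraped US$0.00 is not a real selling price, so treat it as missing.\n",
    "    out['price_usd'] = price.where(price > 0)\n",
    "    # Drop thousands separators: '2,447' -> 2447; unparseable values become NaN.\n",
    "    out['mark_count'] = pd.to_numeric(out['mark'].astype(str).str.replace(',', '', regex=False), errors='coerce')\n",
    "    return out\n",
    "\n",
    "\n",
    "# Three rows copied from the table above.\n",
    "sample = pd.DataFrame({'price': ['US$0.00', 'US$12.99', 'US$45.12'],\n",
    "                       'mark': ['2,447', '704', '248']})\n",
    "clean = to_numeric_columns(sample)\n",
    "print(clean[['price_usd', 'mark_count']])\n",
    "```\n",
    "\n",
    "In the notebook this would be `df2 = to_numeric_columns(df2)`. The original text columns are kept alongside the numeric ones, so the Excel export still shows the prices as scraped.\n"
   ]
  },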
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.to_excel(\"amazon_book.xlsx\", sheet_name=\"经济管理\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Scrapy 2.1.0 - no active project\n",
      "\n",
      "Unknown command: crawl\n",
      "\n",
      "Use \"scrapy\" to see available commands\n"
     ]
    }
   ],
   "source": [
    "! scrapy crawl mobile"
   ]
  },
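  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output above (`no active project`, `Unknown command: crawl`) comes from a fresh kernel (execution count 1) whose working directory was not inside the Scrapy project, so Scrapy only offered its global commands. Scrapy's project detection relies mainly on finding a `scrapy.cfg` file in the current directory or one of its parents. A minimal sketch of that lookup, so the notebook can check before calling `crawl`, follows. `find_scrapy_cfg` is a hypothetical helper, not Scrapy's own API.\n",
    "\n",
    "```python\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def find_scrapy_cfg(start='.'):\n",
    "    # Check `start` and then each parent directory for a scrapy.cfg file.\n",
    "    current = Path(start).resolve()\n",
    "    for folder in [current, *current.parents]:\n",
    "        candidate = folder / 'scrapy.cfg'\n",
    "        if candidate.is_file():\n",
    "            return candidate\n",
    "    return None  # nothing found: roughly the case where Scrapy reports no active project\n",
    "\n",
    "\n",
    "# Self-contained demo in a throwaway directory laid out like a Scrapy project.\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    project = Path(tmp) / 'amazon'\n",
    "    spiders = project / 'amazon' / 'spiders'\n",
    "    spiders.mkdir(parents=True)\n",
    "    (project / 'scrapy.cfg').write_text('[settings]', encoding='utf-8')\n",
    "    from_spiders = find_scrapy_cfg(spiders)\n",
    "    expected = (project / 'scrapy.cfg').resolve()\n",
    "    print(from_spiders == expected)  # True\n",
    "```\n",
    "\n",
    "If `find_scrapy_cfg()` returns `None` in the notebook, `cd` into the folder created by `scrapy startproject amazon` (the one containing `scrapy.cfg`) before running `! scrapy crawl mobile`.\n"
   ]
  },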
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>herf</th>\n",
       "      <th>price</th>\n",
       "      <th>mark</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Hands-On Machine Learning with Scikit-Learn, K...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Hands-Machine-Lear...</td>\n",
       "      <td>US$20.55</td>\n",
       "      <td>427</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Keto Meal Prep Cookbook For Beginners: 600 Eas...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Keto-Meal-Prep-Coo...</td>\n",
       "      <td>US$37.49</td>\n",
       "      <td>566</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>设计数据密集型应用程序： 《可靠、可扩展和可维护系统背后的伟大思想》</td>\n",
       "      <td>https://www.amazon.com/-/zh/Designing-Data-Int...</td>\n",
       "      <td>US$10.99</td>\n",
       "      <td>532</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Small Teaching Online: Applying Learning Scien...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Small-Teaching-Onl...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Minecraft: Guide Collection 4-Book Boxed Set: ...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Minecraft-Collecti...</td>\n",
       "      <td>US$14.99</td>\n",
       "      <td>506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1161</th>\n",
       "      <td>The Everything Learning Brazilian Portuguese B...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Everything-Learnin...</td>\n",
       "      <td>US$14.18</td>\n",
       "      <td>149</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1162</th>\n",
       "      <td>The Walt Disney Studios: A Lot to Remember (Di...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Walt-Disney-Studio...</td>\n",
       "      <td>US$16.64</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1163</th>\n",
       "      <td>Buddy Readers (Parent Pack): Level C: 20 Level...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Buddy-Readers-Pare...</td>\n",
       "      <td>US$10.96</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1164</th>\n",
       "      <td>Persona 3, Vol. 1</td>\n",
       "      <td>https://www.amazon.com/-/zh/Persona-3-Vol-1-At...</td>\n",
       "      <td>US$22.09</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1165</th>\n",
       "      <td>Exploring Microsoft Office 2016 Volume 1 (Expl...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Exploring-Microsof...</td>\n",
       "      <td>US$54.98</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1166 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title  \\\n",
       "0     Hands-On Machine Learning with Scikit-Learn, K...   \n",
       "1     Keto Meal Prep Cookbook For Beginners: 600 Eas...   \n",
       "2                    设计数据密集型应用程序： 《可靠、可扩展和可维护系统背后的伟大思想》   \n",
       "3     Small Teaching Online: Applying Learning Scien...   \n",
       "4     Minecraft: Guide Collection 4-Book Boxed Set: ...   \n",
       "...                                                 ...   \n",
       "1161  The Everything Learning Brazilian Portuguese B...   \n",
       "1162  The Walt Disney Studios: A Lot to Remember (Di...   \n",
       "1163  Buddy Readers (Parent Pack): Level C: 20 Level...   \n",
       "1164                                  Persona 3, Vol. 1   \n",
       "1165  Exploring Microsoft Office 2016 Volume 1 (Expl...   \n",
       "\n",
       "                                                   herf     price mark  \n",
       "0     https://www.amazon.com/-/zh/Hands-Machine-Lear...  US$20.55  427  \n",
       "1     https://www.amazon.com/-/zh/Keto-Meal-Prep-Coo...  US$37.49  566  \n",
       "2     https://www.amazon.com/-/zh/Designing-Data-Int...  US$10.99  532  \n",
       "3     https://www.amazon.com/-/zh/Small-Teaching-Onl...   US$0.00   29  \n",
       "4     https://www.amazon.com/-/zh/Minecraft-Collecti...  US$14.99  506  \n",
       "...                                                 ...       ...  ...  \n",
       "1161  https://www.amazon.com/-/zh/Everything-Learnin...  US$14.18  149  \n",
       "1162  https://www.amazon.com/-/zh/Walt-Disney-Studio...  US$16.64   24  \n",
       "1163  https://www.amazon.com/-/zh/Buddy-Readers-Pare...  US$10.96    8  \n",
       "1164  https://www.amazon.com/-/zh/Persona-3-Vol-1-At...  US$22.09   28  \n",
       "1165  https://www.amazon.com/-/zh/Exploring-Microsof...  US$54.98   63  \n",
       "\n",
       "[1166 rows x 4 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
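    "import pandas as pd\n",
    "\n",
    "# amazon.json holds one scraped item per line (JSON Lines), hence lines=True\n",
    "df = pd.read_json('amazon.json', lines=True)\n",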
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = df.drop_duplicates(['title', 'herf', 'price', 'mark']).copy()  # .copy(): independent frame, so adding a column to df2 is safe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>herf</th>\n",
       "      <th>price</th>\n",
       "      <th>mark</th>\n",
       "      <th>分类</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Hands-On Machine Learning with Scikit-Learn, K...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Hands-Machine-Lear...</td>\n",
       "      <td>US$20.55</td>\n",
       "      <td>427</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Keto Meal Prep Cookbook For Beginners: 600 Eas...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Keto-Meal-Prep-Coo...</td>\n",
       "      <td>US$37.49</td>\n",
       "      <td>566</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>设计数据密集型应用程序： 《可靠、可扩展和可维护系统背后的伟大思想》</td>\n",
       "      <td>https://www.amazon.com/-/zh/Designing-Data-Int...</td>\n",
       "      <td>US$10.99</td>\n",
       "      <td>532</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Small Teaching Online: Applying Learning Scien...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Small-Teaching-Onl...</td>\n",
       "      <td>US$0.00</td>\n",
       "      <td>29</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Minecraft: Guide Collection 4-Book Boxed Set: ...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Minecraft-Collecti...</td>\n",
       "      <td>US$14.99</td>\n",
       "      <td>506</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1161</th>\n",
       "      <td>The Everything Learning Brazilian Portuguese B...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Everything-Learnin...</td>\n",
       "      <td>US$14.18</td>\n",
       "      <td>149</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1162</th>\n",
       "      <td>The Walt Disney Studios: A Lot to Remember (Di...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Walt-Disney-Studio...</td>\n",
       "      <td>US$16.64</td>\n",
       "      <td>24</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1163</th>\n",
       "      <td>Buddy Readers (Parent Pack): Level C: 20 Level...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Buddy-Readers-Pare...</td>\n",
       "      <td>US$10.96</td>\n",
       "      <td>8</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1164</th>\n",
       "      <td>Persona 3, Vol. 1</td>\n",
       "      <td>https://www.amazon.com/-/zh/Persona-3-Vol-1-At...</td>\n",
       "      <td>US$22.09</td>\n",
       "      <td>28</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1165</th>\n",
       "      <td>Exploring Microsoft Office 2016 Volume 1 (Expl...</td>\n",
       "      <td>https://www.amazon.com/-/zh/Exploring-Microsof...</td>\n",
       "      <td>US$54.98</td>\n",
       "      <td>63</td>\n",
       "      <td>计算机与互联网</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1166 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title  \\\n",
       "0     Hands-On Machine Learning with Scikit-Learn, K...   \n",
       "1     Keto Meal Prep Cookbook For Beginners: 600 Eas...   \n",
       "2                    设计数据密集型应用程序： 《可靠、可扩展和可维护系统背后的伟大思想》   \n",
       "3     Small Teaching Online: Applying Learning Scien...   \n",
       "4     Minecraft: Guide Collection 4-Book Boxed Set: ...   \n",
       "...                                                 ...   \n",
       "1161  The Everything Learning Brazilian Portuguese B...   \n",
       "1162  The Walt Disney Studios: A Lot to Remember (Di...   \n",
       "1163  Buddy Readers (Parent Pack): Level C: 20 Level...   \n",
       "1164                                  Persona 3, Vol. 1   \n",
       "1165  Exploring Microsoft Office 2016 Volume 1 (Expl...   \n",
       "\n",
       "                                                   herf     price mark  \\\n",
       "0     https://www.amazon.com/-/zh/Hands-Machine-Lear...  US$20.55  427   \n",
       "1     https://www.amazon.com/-/zh/Keto-Meal-Prep-Coo...  US$37.49  566   \n",
       "2     https://www.amazon.com/-/zh/Designing-Data-Int...  US$10.99  532   \n",
       "3     https://www.amazon.com/-/zh/Small-Teaching-Onl...   US$0.00   29   \n",
       "4     https://www.amazon.com/-/zh/Minecraft-Collecti...  US$14.99  506   \n",
       "...                                                 ...       ...  ...   \n",
       "1161  https://www.amazon.com/-/zh/Everything-Learnin...  US$14.18  149   \n",
       "1162  https://www.amazon.com/-/zh/Walt-Disney-Studio...  US$16.64   24   \n",
       "1163  https://www.amazon.com/-/zh/Buddy-Readers-Pare...  US$10.96    8   \n",
       "1164  https://www.amazon.com/-/zh/Persona-3-Vol-1-At...  US$22.09   28   \n",
       "1165  https://www.amazon.com/-/zh/Exploring-Microsof...  US$54.98   63   \n",
       "\n",
       "           分类  \n",
       "0     计算机与互联网  \n",
       "1     计算机与互联网  \n",
       "2     计算机与互联网  \n",
       "3     计算机与互联网  \n",
       "4     计算机与互联网  \n",
       "...       ...  \n",
       "1161  计算机与互联网  \n",
       "1162  计算机与互联网  \n",
       "1163  计算机与互联网  \n",
       "1164  计算机与互联网  \n",
       "1165  计算机与互联网  \n",
       "\n",
       "[1166 rows x 5 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2[\"分类\"] = '计算机与互联网'  # category column (分类): Computers & Internet\n",
    "df2"
   ]
  },
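  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The computer-books run repeats the same three steps as the economics run: read `amazon.json` as JSON Lines, drop exact duplicates, tag the category. Wrapping them in one helper keeps both runs identical. `load_category` is a hypothetical name, and the sketch assumes each crawl leaves one JSON object per line with the `title`, `herf`, `price` and `mark` fields shown above. Because both runs read the same `amazon.json`, each crawl's output has to be loaded before the next crawl writes to that file.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "COLUMNS = ['title', 'herf', 'price', 'mark']  # fields written by the spider\n",
    "\n",
    "\n",
    "def load_category(source, category):\n",
    "    # Read one crawl's JSON Lines output (a path or a file-like object).\n",
    "    df = pd.read_json(source, lines=True)\n",
    "    # Drop exact repeats; .copy() gives an independent frame for the new column.\n",
    "    df = df.drop_duplicates(subset=COLUMNS).copy()\n",
    "    df['分类'] = category  # same category column name as in the cells above\n",
    "    return df\n",
    "\n",
    "\n",
    "# In-memory stand-in for amazon.json: two identical made-up records.\n",
    "records = pd.DataFrame({\n",
    "    'title': ['Book A', 'Book A'],\n",
    "    'herf': ['link-a', 'link-a'],\n",
    "    'price': ['US$1.00', 'US$1.00'],\n",
    "    'mark': ['1,234', '1,234'],\n",
    "})\n",
    "fake_file = io.StringIO(records.to_json(orient='records', lines=True))\n",
    "demo = load_category(fake_file, '计算机与互联网')\n",
    "print(demo)\n",
    "```\n",
    "\n",
    "With the real data this would be `df2 = load_category('amazon.json', '计算机与互联网')`.\n"
   ]
  },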
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.to_excel(\"amazon_book2.xlsx\", sheet_name=\"计算机与互联网\")"
   ]
  },
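  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two categories are exported to separate workbooks (`amazon_book.xlsx`, `amazon_book2.xlsx`), and the second run reuses the name `df2`, so the economics frame is no longer available under that name at this point. To compare categories as the MVP statement intends, the labelled frames can be kept under separate names, stacked, and summarised per `分类`. A minimal sketch with hypothetical names follows. The price parsing assumes the `US$` prefix and treats `US$0.00` as missing, which is an assumption. The small frames reuse rows copied from the tables above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def compare_categories(frames):\n",
    "    # Stack the per-category frames into one table.\n",
    "    combined = pd.concat(frames, ignore_index=True)\n",
    "    # 'US$12.99' -> 12.99; assumption: US$0.00 is not a usable price, so it becomes NaN.\n",
    "    price = pd.to_numeric(combined['price'].astype(str).str.replace('[^0-9.]', '', regex=True), errors='coerce')\n",
    "    combined['price_usd'] = price.where(price > 0)\n",
    "    # '7,780' -> 7780\n",
    "    combined['mark_count'] = pd.to_numeric(combined['mark'].astype(str).str.replace(',', '', regex=False), errors='coerce')\n",
    "    # Median price and rating count per category, plus the number of books.\n",
    "    summary = combined.groupby('分类')[['price_usd', 'mark_count']].median()\n",
    "    summary['books'] = combined.groupby('分类').size()\n",
    "    return summary\n",
    "\n",
    "\n",
    "# A few rows copied from the tables above, one frame per category.\n",
    "management = pd.DataFrame({\n",
    "    'title': ['Limitless', 'Outliers', 'Thinking, Fast and Slow'],\n",
    "    'price': ['US$12.99', 'US$12.99', 'US$0.00'],\n",
    "    'mark': ['704', '7,780', '8,035'],\n",
    "    '分类': '经济管理',\n",
    "})\n",
    "computing = pd.DataFrame({\n",
    "    'title': ['Hands-On Machine Learning', 'Designing Data-Intensive Applications'],\n",
    "    'price': ['US$20.55', 'US$10.99'],\n",
    "    'mark': ['427', '532'],\n",
    "    '分类': '计算机与互联网',\n",
    "})\n",
    "summary = compare_categories([management, computing])\n",
    "print(summary)\n",
    "```\n",
    "\n",
    "Medians are used rather than means because a few very popular titles (thousands of ratings) would otherwise dominate. `pd.ExcelWriter` can likewise write each category to its own sheet of a single workbook instead of two separate files.\n"
   ]
  },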
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
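    "## Takeaways and Acknowledgements\n",
    "1. Takeaways: Over this semester's data mining course I went from learning the basics of HTML and XPath, to using XPath to locate the fields to extract, to using the `requests_html` module for URL parsing, pagination and similar tasks. Hands-on scraping projects on Liepin, WeChat Official Accounts, CNKI and Bing Images gave me a first practical understanding of data mining.\n",
    "2. Acknowledgements:\n",
    "    - Online articles on scraping similar websites, in particular:\n",
    "        - [Processing JSON files with Pandas](https://www.jianshu.com/p/d27b72178b70)\n",
    "        - [Scraping Amazon product information with scrapy + xpath](https://www.jianshu.com/p/1762c4cfa17b)\n",
    "    - Tutorials consulted while learning how to set up a Scrapy project\n"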
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
